Evaluating the Significance of Sequence

Authors

  • Qicheng Ma
  • Jason T. L. Wang
Abstract

Sdiscover is a tool capable of finding subsequences, possibly separated by arbitrarily long gaps, in a set of sequences. These subsequences are referred to as motifs. This paper proposes a method to evaluate the significance of the sequence motifs found by Sdiscover. The method is based on the minimum description length principle and Shannon's coding theory. The equivalence of the proposed method to Bayesian inference is also discussed.

1 Introduction

As the Human Genome Project [4] is expected to complete in a few years, research focus has been shifted from sequencing the biological data to mining and interpreting these data [1, 7, 9]. The interesting patterns to be mined range from genes [3], to DNA or protein sequence motifs [2, 10], to protein and RNA structure motifs [5, 9]. In this paper, we consider the problem of evaluating the significance of sequence motifs found by our pattern matching tool, Sdiscover [10]. Given a set of sequences, the motifs of interest are in the regular ex

Similar resources

Compression, Significance and Accuracy

Stephen Muggleton, Ashwin Srinivasan, and Michael Bain, The Turing Institute, 36 North Hanover Street, Glasgow G1 2AD, UK. Abstract: Inductive Logic Programming (ILP) involves learning relational concepts from examples and background knowledge. To date all ILP learning syst...

Systematic and Fully Automated Identification of Protein Sequence Patterns

We present an efficient algorithm to systematically and automatically identify patterns in protein sequence families. The procedure is based on the Splash deterministic pattern discovery algorithm and on a framework to assess the statistical significance of patterns. We demonstrate its application to the fully automated discovery of patterns in 974 PROSITE families (the complete subset of PROSI...

Motion Differential SPIHT for Image Sequence Coding

Efficient image sequence coding exploits both intra- and inter-frame correlations. SPIHT is efficient in intra-frame decorrelation for still images. Based on SPIHT, differential-SPIHT removes inter-frame redundancy by reusing the significance map of a SPIHT coded frame. The motion differential SPIHT (MD-SPIHT) automatically decides the coding methods for each frame, according to the inter-frame corre...

FBST Regularization and Model Selection

We show how the Full Bayesian Significance Test (FBST) can be used as a model selection criterion. The FBST was presented by Pereira and Stern [3842] as a coherent Bayesian significance test.

On the statistical significance of temporal firing patterns in multi-neuronal spike trains

Repeated occurrences of serial firing sequences of a group of neurons with fixed time delays between neurons are observed in many experiments involving simultaneous recordings from multiple neurons. Such temporal patterns are potentially indicative of underlying microcircuits and it is important to know when a repeatedly occurring pattern is statistically significant. These sequences are typically i...

Publication date: 2000